LISTING OF CLAIMS: 

1. (Previously Presented) A method for converting a generic document, wherein a 
generic document comprises a document in a particular format type, into a structured document, 
wherein a structured document includes a plurality of content elements wrapped in pairs of 
hierarchically nested tags, comprising: 

parsing the generic document of the particular format type containing content into a 
plurality of content elements; and 

for a selected content element, suggesting an optimal tag according to a tag suggestion 
procedure; 

wherein the tag suggestion procedure comprises: 

providing sample data in the form of structured sample documents; 

analyzing patterns in the sample data to derive a set of tag suggestions and tag suggestion 
rules; 

deriving a set of candidate tags from the set of tag suggestions for the selected content 
element in accordance with the tag suggestion rules; and 

evaluating the set of candidate tags according to tag suggestion criteria to determine an 
optimal tag for the selected content element. 

2. (Original) The method of claim 1, wherein the tag suggestion criteria 
comprises satisfying a similarity function. 

3. (Original) The method of claim 1, wherein the set of tag suggestions are 
generated during creation of the structured document. 

4. (Original) The method of claim 1, wherein the set of tag suggestions are 
generated prior to creation of the structured document. 

5. (Original) The method of claim 1, wherein the structured sample document 
comprises an XML document having a DTD associated with it. 

6. (Original) The method of claim 1, wherein the set of tag suggestions includes 
tree patterns of tags. 

7. (Original) The method of claim 1, wherein the optimal tag maximizes a 
similarity function with patterns found in the sample data. 

8. (Original) The method of claim 6, wherein the tag suggestion criteria 
comprises balancing size of tree patterns of tags and frequency of occurrence of tree patterns of 
tags in the sample data. 

9. (Original) The method of claim 1, wherein the set of tag suggestions includes 
a set of tree patterns of tags $t_i \in T$, and a set $C$ of candidates is a set of all patterns in $T$ with all 
their prefixes, $C = \{c \mid c \text{ is a prefix of } t_i \in T\}$; 

wherein a similarity function between a candidate $c \in C$ and a tree pattern $t_i \in T$ 
satisfies: $\mathrm{sim}(c, t_i) = |c| / |t_i|$, if $c$ is a tree-prefix of $t_i$; 
$\mathrm{sim}(c, t_i) = 0$, otherwise; and 

wherein the optimal tag comprises a context-free candidate $c \in C$ that maximizes an 
aggregate similarity measure $\mathrm{SIM}(c, T)$, where $\mathrm{SIM}(c, T) = \sum_i \mathrm{sim}(c, t_i) \cdot pr_i$. 
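
As an illustration only, the context-free selection recited in claim 9 might be sketched as follows. The sketch simplifies each tree pattern to a single root-to-leaf path of tag names (a tuple), so "tree-prefix" becomes an ordinary sequence prefix, and it assumes that pr_i is the relative frequency of pattern t_i in the sample data. Neither assumption is stated in the claim, and the function names are hypothetical.

```python
from collections import Counter


def prefixes(pattern):
    """All non-empty prefixes of a tag path, including the path itself."""
    return [pattern[:k] for k in range(1, len(pattern) + 1)]


def sim(c, t):
    """|c| / |t| if c is a prefix of t, otherwise 0."""
    return len(c) / len(t) if t[:len(c)] == c else 0.0


def best_context_free(patterns):
    """Pick the candidate c in C that maximizes SIM(c, T).

    patterns: tag paths (tuples) observed in the sample data, repeats included.
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    # Assumption: pr_i is the relative frequency of pattern t_i.
    pr = {t: n / total for t, n in counts.items()}
    # C: every pattern in T together with all of its prefixes.
    candidates = {c for t in counts for c in prefixes(t)}

    def aggregate(c):
        return sum(sim(c, t) * p for t, p in pr.items())

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=aggregate)
```

With this scoring, a short prefix shared by many patterns competes against a longer pattern that occurs often, which is one way to read the size and frequency trade-off of claim 8.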

10. (Original) The method of claim 9, wherein a candidate set in context $t_{ctx}$ is 
defined as $C(t_{ctx}) = \{c \in C \mid t_{ctx} \text{ is a prefix of } c\}$; and 

wherein the optimal tag comprises a context-aware candidate $c \in C$ that maximizes an 
aggregate similarity measure $\mathrm{SIM}(c, T)$, where $\mathrm{SIM}(c, T) = \sum_i \mathrm{sim}(c, t_i) \cdot pr_i$. 
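
A hedged sketch of the context-aware variant in claim 10: candidates are restricted to those that extend the current context before the aggregate similarity is maximized. As a simplification, patterns are again modeled as tag paths (tuples), and the probabilities pr_i are passed in as a dictionary. The names `best_in_context` and `pr` are hypothetical.

```python
def sim(c, t):
    """|c| / |t| if c is a prefix of t, otherwise 0."""
    return len(c) / len(t) if t[:len(c)] == c else 0.0


def best_in_context(pr, context):
    """Return the candidate extending `context` with the highest SIM(c, T).

    pr: dict mapping each pattern t_i (tuple of tags) to its probability pr_i.
    context: tuple of tags already placed (the context pattern).
    """
    # C: all patterns in T with all their prefixes.
    candidates = {t[:k] for t in pr for k in range(1, len(t) + 1)}
    # Candidate set in context: candidates that have `context` as a prefix.
    in_context = [c for c in sorted(candidates) if c[:len(context)] == context]
    if not in_context:
        return None  # nothing in the sample data extends this context
    return max(in_context, key=lambda c: sum(sim(c, t) * p for t, p in pr.items()))
```

Returning `None` for an unseen context is a design choice of this sketch; the claim does not say what happens when the candidate set is empty.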

11. (Previously Presented) A method for authoring of a structured document, 
wherein a structured document comprises a plurality of content elements wrapped in pairs of 
tags, comprising: 

generating content elements wrapped in pairs of tags; and 

for a selected tag, suggesting an optimal content fragment according to a content 
suggestion procedure; 

wherein the content suggestion procedure comprises: 

providing a plurality of sample structured documents; 

analyzing the sample structured documents for content fragments; 

deriving a set of content fragments from the sample structured document associated with 
the selected tag; 

evaluating the set of content fragments according to a content fragment suggestion criteria 
to determine an optimal content fragment suggestion for the tag, wherein the optimal content 
fragment suggestion is the most probable content fragment for the selected tag. 

12. (Original) The method of claim 11, further comprising assigning a score to 
each content fragment in the set of content fragments, wherein the score is a ratio of number of 
occurrences of the content fragment under the selected tag and number of occurrences of the 
selected tag in the sample structured document. 
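
The ratio in claim 12 can be illustrated with a short sketch. It assumes the sample structured document has already been flattened into (tag, content fragment) pairs; that representation and the function name are hypothetical.

```python
from collections import Counter


def fragment_scores(pairs, tag):
    """Score each fragment seen under `tag`.

    score = (occurrences of the fragment under the tag)
            / (occurrences of the tag)
    pairs: iterable of (tag, fragment) tuples taken from the sample document.
    """
    pairs = list(pairs)
    tag_count = sum(1 for t, _ in pairs if t == tag)
    if tag_count == 0:
        return {}  # the tag never occurs in the sample data
    fragment_counts = Counter(f for t, f in pairs if t == tag)
    return {f: n / tag_count for f, n in fragment_counts.items()}
```

For example, if "draft" appears under two of three `status` tags, its score is 2/3.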

13. (Original) The method of claim 12, wherein the optimal content fragment 
suggestion is the content fragment with the highest score. 

14. (Original) The method of claim 12, further comprising assigning a context to 
each content fragment in the set of content fragments, wherein context comprises the structural 
context of the tag surrounding the content fragment. 

15. (Original) The method of claim 12, wherein the optimal content fragment 
suggestion is the content fragment with the highest score greater than a threshold value. 
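
A minimal sketch of the thresholded selection in claim 15, assuming scores have already been computed as a dictionary from fragment to score. The function name and the choice to return `None` when no fragment qualifies are assumptions of the sketch.

```python
def suggest_fragment(scores, threshold):
    """Return the highest-scoring fragment if its score exceeds `threshold`."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Suggest only when the best score is strictly greater than the threshold.
    return best if scores[best] > threshold else None
```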

16. (Original) The method of claim 14, wherein each content fragment is 

Application No.: 10/607,667 



referenced by a partial path from the sample structured document root and the context comprises 
the partial path of the content fragment in the sample structured document. 
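
One possible way to index content fragments by their path context, as claim 16 describes, is sketched below using Python's standard `xml.etree.ElementTree`. For simplicity the sketch uses the full path from the document root as the context rather than a partial path, and it only looks at the text directly inside each element. Both are assumptions of the sketch, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def fragments_by_path(xml_text):
    """Map each tag path (tuple from the root) to a Counter of its text fragments."""
    root = ET.fromstring(xml_text)
    table = defaultdict(Counter)

    def walk(elem, path):
        here = path + (elem.tag,)
        text = (elem.text or "").strip()
        if text:
            table[here][text] += 1
        for child in elem:
            walk(child, here)

    walk(root, ())
    return table
```

Keying by path keeps a `title` inside a section apart from a `title` inside an appendix, so each can receive different suggestions.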

17. (Original) The method of claim 11, further comprising: 

selecting a small linguistic unit within each content fragment in the set of content 
fragments; and 

assigning a score to the small linguistic unit, wherein the score is a ratio of number of 
occurrences of the linguistic unit under the selected tag and number of occurrences of the 
selected tag in the sample structured document. 
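
The same ratio can be applied to smaller units, as claim 17 recites. The sketch below assumes the unit is a whitespace-delimited, lower-cased word and counts every occurrence of the word under the tag. The tokenization, the counting convention, and the function name are assumptions of the sketch, not part of the claim.

```python
from collections import Counter


def unit_scores(pairs, tag):
    """Score each word seen under `tag`.

    score = (occurrences of the word under the tag) / (occurrences of the tag)
    pairs: iterable of (tag, fragment) tuples taken from the sample document.
    """
    fragments = [f for t, f in pairs if t == tag]
    if not fragments:
        return {}
    # Assumption: words are whitespace-delimited and compared case-insensitively.
    counts = Counter(w for f in fragments for w in f.lower().split())
    return {w: n / len(fragments) for w, n in counts.items()}
```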

18. (Original) The method of claim 17, wherein the small linguistic unit is a 
word, a phrase or a sentence. 

19. (Original) The method of claim 14, wherein the context of each content 
fragment in the set of content fragments comprises the structural tree around the tag surrounding 
the content fragment. 

20. (Original) The method of claim 1, wherein content comprises text. 